Using Discourse Analysis to Improve Text Categorization in MEDLINE

Authors

  • Patrick Ruch
  • Antoine Geissbühler
  • Julien Gobeill
  • Frédéric Lisacek
  • Imad Tbahriti
  • Anne-Lise Veuthey
  • Alan R. Aronson
Abstract

PROBLEM Automatic keyword assignment has been widely studied in medical informatics in the context of the MEDLINE database, both to support searching in MEDLINE and to provide an indicative "gist" of the content of an article. Automatic assignment of Medical Subject Headings (MeSH), which is formally an automatic text categorization task, has been proposed using various methods or combinations of methods, including machine learning (naïve Bayes, neural networks, etc.), linguistically-motivated methods (syntactic parsing, semantic tagging), and information retrieval.

METHODS In the present study, we evaluate whether the argumentative structure of scientific articles can improve the effectiveness of a categorizer that combines linguistically-motivated and information retrieval methods. Our argumentative classifier, which uses representation levels inherited from the field of discourse analysis, classifies the sentences of an abstract into four classes: PURPOSE, METHODS, RESULTS and CONCLUSION. For the evaluation, the OHSUMED collection, a sample of MEDLINE, is used as a benchmark. For each abstract in the collection, the output of the argumentative classifier, i.e. the labeling of each sentence with an argumentative class, is used to modify the original ranking of the MeSH categorizer.

RESULTS The most effective combination (+2%, p<0.003) strongly overweights the METHODS section and moderately overweights the RESULTS and CONCLUSION sections.

CONCLUSION Although modest, the improvement brought by argumentative features for text categorization confirms that discourse analysis methods could benefit text mining in scientific digital libraries.
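The reranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the sentence labels modify the ranking, so the multiplicative boost, the `Candidate` structure, the `rerank` function and the numeric weights below are all assumptions made for illustration. Only the qualitative ordering of the weights (METHODS strongly overweighted, RESULTS and CONCLUSION moderately) comes from the abstract.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract only states that METHODS is strongly
# overweighted and RESULTS/CONCLUSION moderately; the numbers are invented.
CLASS_WEIGHTS = {"PURPOSE": 1.0, "METHODS": 2.0, "RESULTS": 1.5, "CONCLUSION": 1.5}


@dataclass
class Candidate:
    term: str           # candidate MeSH heading
    base_score: float   # score from the original (unweighted) categorizer
    sentence_ids: list  # indices of the abstract sentences supporting the term


def rerank(candidates, sentence_labels):
    """Rescale each base score by the mean weight of its supporting sentences."""
    rescored = []
    for c in candidates:
        weights = [CLASS_WEIGHTS[sentence_labels[i]] for i in c.sentence_ids]
        # A term with no supporting sentence keeps its original score.
        boost = sum(weights) / len(weights) if weights else 1.0
        rescored.append((c.term, c.base_score * boost))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy abstract with four sentences, one per argumentative class.
labels = ["PURPOSE", "METHODS", "RESULTS", "CONCLUSION"]
candidates = [
    Candidate("Neoplasms", 0.50, [0]),
    Candidate("Polymerase Chain Reaction", 0.40, [1]),
    Candidate("Prognosis", 0.30, [2, 3]),
]
ranking = rerank(candidates, labels)
```

In this toy example the heading supported by a METHODS sentence overtakes a heading with a higher base score that is supported only by a PURPOSE sentence, which is the kind of ranking change the abstract describes.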

Similar resources

Using Critical Discourse Analysis Based Instruction to Improve EFL Learners’ Writing Complexity, Accuracy and Fluency

The literature of ELT is perhaps overwhelmed by attempts to enhance learners’ writing through the application of different methodologies. One such methodology is critical discourse analysis which is founded upon stressing not only the decoding of the propositional meaning of a text but also its ideological assumptions. Accordingly, this study was an attempt to investigate the impact of critical...

The categorization of “Iran’s handicrafts” at the intersection of “Westernism discourse” with “Orientalism discourse” in the Qajar period

This article examines the categorization and separation of “handicraft of Iran” in relation to the “Westernism” and “Orientalism” discourses in the Qajar discursive atmosphere, using the discourse analysis method, and addresses the questions of how “Handicrafts” is categorized at this intersection in the Qajar period, and how these applied works changed into functional objects in the servic...

Automatic Text Categorization and Its Application to Text

We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections f...

Automatic Text Categorization and Its Application to Text Retrieval

We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections...

Characterizing Online Discussion Using Coarse Discourse Sequences

In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts towards the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and allow for easy annotation by crowd workers. We collect and release a corpus of ov...

Journal title:
  • Studies in health technology and informatics

Volume 129, Pt 1

Publication date: 2007